The accuracy of statistical confidence estimates in shotgun proteomics
Author: Viktor Granholm
Abstract
High-throughput techniques are currently some of the most promising methods to study molecular biology, with the potential to improve medicine and enable new biological applications. In proteomics, the large-scale study of proteins, the leading method is mass spectrometry. At present, researchers can routinely identify and quantify thousands of proteins in a single experiment using a technique called shotgun proteomics. A challenge of these experiments is the computational analysis and interpretation of the mass spectra. A shotgun proteomics experiment easily generates tens of thousands of spectra, each thought to represent a peptide from a protein. Due to the immense biological and technical complexity, however, our computational tools often misinterpret these spectra and derive incorrect peptides. As a consequence, the biological interpretation of the experiment relies heavily on the statistical confidence that we estimate for the identifications. In this thesis, I have included four articles from my research on the accuracy of the statistical confidence estimates in shotgun proteomics: how to achieve it and how to evaluate it. The first two papers present a new method that uses pre-characterized protein samples to evaluate this accuracy. The third paper deals with how to avoid statistical inaccuracies when using machine learning techniques to analyze the data. In the fourth paper, we present a new tool for analyzing shotgun proteomics results, and evaluate the accuracy of its statistical estimates using the method from the first two papers. The work I have included here can facilitate the development of new and accurate computational tools in mass spectrometry-based proteomics. Such tools will help make the interpretation of the spectra and the subsequent biological conclusions more reliable.
© Viktor Granholm, Stockholm 2014, pages 1–51
ISBN 978-91-7447-787-0
Printed in Sweden by US-AB, Stockholm 2014
Distributor: Department of Biochemistry and Biophysics, Stockholm University

List of publications

I have included the following articles in the thesis.

PAPER I: On using samples of known protein content to assess the statistical calibration of scores assigned to peptide-spectrum matches in shotgun proteomics.
Viktor Granholm, William Stafford Noble & Lukas Käll
Journal of Proteome Research, 10(5), 2671–2678 (2011). DOI: 10.1021/pr1012619

PAPER II: Determining the calibration of confidence estimation procedures for unique peptides in shotgun proteomics.
Viktor Granholm, José Fernández Navarro, William Stafford Noble & Lukas Käll
Journal of Proteomics, 80(0), 123–131 (2013). DOI: 10.1016/j.jprot.2012.12.007

PAPER III: A cross-validation scheme for machine learning algorithms in shotgun proteomics.
Viktor Granholm, William Stafford Noble & Lukas Käll
BMC Bioinformatics, 13(Suppl 16), S3 (2012). DOI: 10.1186/1471-2105-13-S16-S3

PAPER IV: Fast and accurate database searches with MS-GF+Percolator
Viktor Granholm, Sangtae Kim, José Fernández Navarro, Erik Sjölund, Richard Smith & Lukas Käll
Journal of Proteome Research, 13(2), 890–897 (2014). DOI: 10.1021/pr400937n

The articles are printed here with permission from the respective publishers.

Other publications that are not included in the thesis.

Quality assessments of peptide-spectrum matches in shotgun proteomics
Viktor Granholm & Lukas Käll
Proteomics, 11(6), 1086–1093 (2011). DOI: 10.1002/pmic.201000432

Mass fingerprinting of complex mixtures: protein inference from high-resolution peptide masses and predicted retention times
Luminita Moruz, Michael Hoopmann, Magnus Rosenlund, Viktor Granholm, Robert Moritz & Lukas Käll
Journal of Proteome Research, 12(12), 5730–5741 (2013). DOI: 10.1021/pr400705q

Membrane protein shaving with thermolysin can be used to evaluate topology predictors
Maria Bendz, Marcin Skwark, Daniel Nilsson, Viktor Granholm, Susana Cristobal, Lukas Käll & Arne Elofsson
Proteomics, 13(9), 1467–1480 (2013). DOI: 10.1002/pmic.201200517

HiRIEF LC-MS enables deep proteome coverage and unbiased proteogenomics
Rui Branca, Lukas Orre, Henrik Johansson, Viktor Granholm, Mikael Huss, Åsa Pérez-Bercoff, Jenny Forshed, Lukas Käll & Janne Lehtiö
Nature Methods, 11(1), 59–62 (2014). DOI: 10.1038/nmeth.2732
Related resources
Improved False Discovery Rate Estimation Procedure for Shotgun Proteomics
Interpreting the potentially vast number of hypotheses generated by a shotgun proteomics experiment requires a valid and accurate procedure for assigning statistical confidence estimates to identified tandem mass spectra. Despite the crucial role such procedures play in most high-throughput proteomics experiments, the scientific literature has not reached a consensus about the best confidence e...
Determining the calibration of confidence estimation procedures for unique peptides in shotgun proteomics.
The analysis of a shotgun proteomics experiment results in a list of peptide-spectrum matches (PSMs) in which each fragmentation spectrum has been matched to a peptide in a database. Subsequently, most protein inference algorithms rank peptides according to the best-scoring PSM for each peptide. However, there is disagreement in the scientific literature on the best method to assess the statist...
Effect of Laparoscopic Gastric Plication on the Blood Protein Profile of Obese Subjects Using Shotgun Proteomics
Introduction: Nowadays, bariatric surgery is considered to be the most effective technique in the treatment of morbid obesity. In the current study, the effect of Laparoscopic Gastric Plication (LGP), a new technique, on the serum protein profile of obese patients has been investigated following surgery. Materials and Methods: Serum of 16 obese subjects with mean body mass index (BMI) of 41.2±5...
Design and Validation of Proteome Measurements
Proteomics is a branch in biology that aims to comprehensively characterize a proteome. Mass spectrometry based proteomics has proven to be the most powerful approach to achieve this goal. This thesis introduces statistical concepts to optimally design and validate shotgun proteomics experiments and thereby enables to efficiently achieve reliable and extensive proteome coverage. The first part ...
Addressing Statistical Biases in Nucleotide-Derived Protein Databases for Proteogenomic Search Strategies
Proteogenomics has the potential to advance genome annotation through high quality peptide identifications derived from mass spectrometry experiments, which demonstrate a given gene or isoform is expressed and translated at the protein level. This can advance our understanding of genome function, discovering novel genes and gene structure that have not yet been identified or validated. Because ...
Publication date: 2004